Abstract. Community definitions usually focus on edges, inside and between the
communities. However, the high density of edges within a community determines
correlations between nodes that go beyond nearest neighbours and that are indicated
by the presence of motifs. We show how motifs can be used to define general classes of
nodes, including communities, by extending the mathematical expression of Newman-Girvan
modularity. We then construct a general framework and apply it to some
synthetic and real networks.
1. Introduction

Modular structure in complex networks has become a challenging subject of study,
starting with its very definition [1]. One of the most successful approaches has been
the introduction of the quality function called modularity [2, 3], which accomplishes two
goals: (i) it implicitly defines modules, and (ii) it provides a quantitative measure
to find them. It is based on the intuitive idea that random networks are not expected
to exhibit modular structure (communities) beyond fluctuations.

A lot of work has been done to devise reliable techniques to maximize modularity [4,
5, 6, 7, 8, 9]. However, very little has been done to analyze the concept of modularity
itself and its reliability as a method for community detection. To a large extent, the
success of modularity as a quality function to analyze the modular structure of complex
networks relies on its intrinsic simplicity. The researcher interested in this analysis is
provided with a non-parametric function to be optimized: modularity. The result of the
analysis is a partition of the network into communities such that the number
of edges within each community is larger than the number of edges one would expect to
find by random chance. As a consequence, each community is a subset of nodes more
connected among themselves than with the rest of the nodes in the network. Recently, it has
been shown that modularity is not the panacea of the community detection problem; in
particular, it suffers from a resolution limit that prevents it from grasping the modular
structure of networks at small scales [10]. Moreover, modularity is strongly focused on
communities, so it cannot be used in general to detect groups of nodes revealed by
alternative connectivity patterns. The only exception is represented by "anti-communities",
i.e. groups of nodes with few edges inside and many edges connecting different groups.
The presence of anti-communities indicates that the network has a multipartite structure.
Anti-communities could be detected by modularity minimization [11], although the results
are not so good, as we mention in section 3.

In general, detecting multipartite structure from first principles requires a definition
of the classes that is quite different (in fact, opposite) from standard
community definitions. Let us consider bipartite networks, where nodes/actors are
connected through other entities, for example collaboration in a piece of work, attendance at
an event, etc. In these specific cases, nodes of the same class (e.g. actors) are not
directly linked, or share only a few edges, and usually some projection of the network
onto a subnetwork of only one class of nodes is needed for subsequent analysis. However,
any projection implies knowledge about the different classes of nodes. The definition of
community must be generalized to deal with these cases. Doing it within a modularity-based
framework requires a different formulation of modularity [12, 13].

We remark that bipartite networks are characterized by the fact that any path
of even length starting from a node of either class ends in the same class, due to the
absence of internal edges in each class. So, if the two classes are A and B and we start
from a node $i_a$ of class A, the first step leads to one of its neighbours, say $i_b$, which is
in B; the next step leads to a neighbour $i'_a$ of $i_b$, which is in A; and so on. In this way,
paths of even length starting and ending in the same class may reveal bipartite structure, if
there are many of them. On the other hand, in a graph with modular structure there
are many edges inside each module, so one accordingly expects a large number of paths
between the nodes. In particular, one expects a large number of cycles, i.e. closed paths.
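The parity argument above can be checked numerically. The sketch below uses NumPy and a hypothetical five-node bipartite graph. For each length $\ell$ it counts the paths whose two end points lie in different classes. Paths are taken in the paper's sense, so repeated nodes are allowed. These counts are the off-class entries of the $\ell$th power of the adjacency matrix. The function name `paths_between_classes` is illustrative only.

```python
import numpy as np

# Hypothetical bipartite graph: class A = {0, 1, 2}, class B = {3, 4};
# every edge joins a node of A to a node of B.
edges = [(0, 3), (1, 3), (1, 4), (2, 4)]
W = np.zeros((5, 5), dtype=int)
for i, j in edges:
    W[i, j] = W[j, i] = 1                       # undirected, unweighted

cls = np.array([0, 0, 0, 1, 1])                 # 0 = class A, 1 = class B
same = cls[:, None] == cls[None, :]             # same[i, j]: i, j in the same class

def paths_between_classes(length):
    """Number of paths of the given length (repeated nodes allowed) whose
    end points lie in different classes: off-class entries of W**length."""
    P = np.linalg.matrix_power(W, length)
    return int(P[~same].sum())

for length in range(1, 7):
    print(length, paths_between_classes(length))
# Every even length prints 0: an even path always ends in its starting class.
```

Only odd lengths produce class-crossing counts. This is the signature that the even-path motifs introduced below exploit.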

We deduce that short paths, or more generally motifs, could be used to define and
identify both communities and more general topological classes of nodes. Here we
propose a general framework to classify nodes based on motifs. Classes will be defined
on the principle that they "contain" more motifs than a null model representing
a randomized version of the network under study. We adopt the null model of modularity,
i.e. a random network with the same degree/strength sequence as the original network,
because modularity lends itself to a simple generalization, which makes calculations
straightforward. We shall derive different extensions of modularity, where the building
blocks are motifs and not just edges, as in the original expression. After
that, we shall maximize the new functions to detect the classes.

We stress that we use a modularity-based framework only as an illustrative example
of how motifs could be used to detect general node classes in networks; in general,
our framework can be useful for any other method designed to detect substructure in
networks. Note that the extended quality functions that we shall introduce are also
subject to the resolution limit, which states that modularity is unable to
resolve substructures below a certain size, just like the original modularity [10].
However, this limit is now motif-dependent, so substructures at different resolutions
can be resolved by changing the motif.

The rest of the paper is structured as follows: in the next section we present the 
mathematical formalism of the generalized modularities; then, we test the framework 
on synthetic and real networks; finally we discuss the results obtained. 



2. Mathematical formulation of motif modularity 

The original definition of modularity by Newman and Girvan [2] only deals with
unweighted and undirected networks. Later on, Newman generalized it to cope with
weighted networks [3]. In this work we start from an extension of modularity to weighted
directed networks [14], which reduces to the previous one for undirected networks, and
which is calculated as follows:

$$Q(C) = \frac{1}{2w}\sum_{i=1}^{N}\sum_{j=1}^{N}\left(w_{ij} - \frac{w_i^{\mathrm{out}}\, w_j^{\mathrm{in}}}{2w}\right)\delta(C_i, C_j), \qquad (2.1)$$

where $w_{ij}$ is the weight of the connection from the $i$th to the $j$th node, $w_i^{\mathrm{out}} = \sum_j w_{ij}$
and $w_j^{\mathrm{in}} = \sum_i w_{ij}$ stand for their output and input strengths respectively, $2w = \sum_i \sum_j w_{ij}$ is
the total strength of the network, $C_i$ is the index of the community which node $i$ belongs
to, and the Kronecker $\delta(C_i, C_j)$ is 1 if nodes $i$ and $j$ are in the same community, and 0 otherwise.
For undirected networks, the only change is that $w_i^{\mathrm{out}} = w_i^{\mathrm{in}} = w_i$. The larger the value
of modularity, the better the corresponding partition of the network into modules.
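A minimal NumPy sketch of (2.1) follows. It assumes the network is given as a dense weight matrix `W`, with `W[i, j]` equal to $w_{ij}$, and the partition as an array of community labels. `modularity` is a hypothetical helper name, not code from the paper. For an undirected network `W` is symmetric and the same code applies.

```python
import numpy as np

def modularity(W, labels):
    """Weighted directed modularity Q(C) of eq. (2.1).

    W      : (N, N) array, W[i, j] = weight of the edge from node i to node j.
    labels : length-N sequence, labels[i] = community index C_i.
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    two_w = W.sum()                                  # 2w = sum_ij w_ij
    w_out = W.sum(axis=1)                            # w_i^out = sum_j w_ij
    w_in = W.sum(axis=0)                             # w_j^in  = sum_i w_ij
    delta = labels[:, None] == labels[None, :]       # delta(C_i, C_j)
    B = W - np.outer(w_out, w_in) / two_w            # observed minus null weight
    return float((B * delta).sum() / two_w)

# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0                          # undirected: symmetric W
print(modularity(W, [0, 0, 0, 1, 1, 1]))             # approximately 5/14 = 0.357...
print(modularity(W, [0] * 6))                        # approximately 0
```

Splitting along the bridge gives $Q = 5/14$. Putting every node in one community gives $Q = 0$, because the observed and null terms then cancel exactly.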
In the next subsections we develop the mathematical formulation of a motif
modularity which generalizes the standard one in (2.1). First, the most general
framework is explained, and then the formalism is applied to several classes of motifs.

2.1. General motif modularity 

Let $\mathcal{M} = (V_{\mathcal{M}}, E_{\mathcal{M}})$ be a motif (connected undirected graph, or weakly connected
directed graph), where $V_{\mathcal{M}}$ is the set of $M$ nodes of the motif, and $E_{\mathcal{M}} \subseteq V_{\mathcal{M}} \times V_{\mathcal{M}}$ is
the set of its edges.

Let $\{w_{ij} \ge 0 \mid i, j = 1, \ldots, N\}$ be the weights of a (directed or undirected) network
of $N$ nodes, where $w_{ij} = 0$ if there is no edge from the $i$th to the $j$th node, and
$w_{ij} \in \{0, 1\}$ if the network is unweighted. The nodes of the motif will be labeled by the
indices $i_1, i_2, \ldots, i_M$, all of them running between 1 and $N$.

Given a certain partition $C$ of an unweighted network into communities, the number
of motifs fully included within the communities is given by

$$\Phi_{\mathcal{M}}(C) = \sum_{i_1=1}^{N}\sum_{i_2=1}^{N}\cdots\sum_{i_M=1}^{N}\;\prod_{(a,b)\in E_{\mathcal{M}}} w_{i_a i_b}\,\delta(C_{i_a}, C_{i_b}). \qquad (2.2)$$

Degenerate motifs, i.e. those where some nodes are counted more than once, are
included in this sum. The formula also holds for weighted networks, as follows from
the mapping between weighted networks and unweighted multigraphs [3].
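The sum in (2.2) can be made concrete with a deliberately naive sketch. It visits all $N^M$ node tuples $(i_1, \ldots, i_M)$, degenerate ones included. Motif positions are 0-based in the code, whereas the text uses 1 to $M$. The helper name `motif_count` is hypothetical. The cost grows as $N^M$, so it is only meant for toy checks.

```python
import itertools
import numpy as np

def motif_count(W, motif_edges, M, labels=None):
    """Brute-force evaluation of eq. (2.2), or of eq. (2.3) when labels is None.

    W           : (N, N) weight matrix, W[i, j] = w_ij.
    motif_edges : list of (a, b) pairs of 0-based motif positions (a, b < M).
    labels      : labels[i] = C_i; None means no masking (single community).
    All N**M tuples are visited, so degenerate motifs with repeated nodes
    are included, as in the text.
    """
    W = np.asarray(W, dtype=float)
    N = W.shape[0]
    total = 0.0
    for nodes in itertools.product(range(N), repeat=M):
        term = 1.0
        for a, b in motif_edges:
            i, j = nodes[a], nodes[b]
            same = labels is None or labels[i] == labels[j]
            term *= W[i, j] if same else 0.0   # masked weight w_{i_a i_b}(C)
        total += term
    return total

# Usage: a single undirected triangle on nodes 0, 1, 2.
W = np.ones((3, 3)) - np.eye(3)
triangle = [(0, 1), (1, 2), (2, 0)]
print(motif_count(W, triangle, 3))             # 6.0: 3 rotations x 2 directions
print(motif_count(W, triangle, 3, [0, 0, 1]))  # 0.0: node 2 is in another community
```

A single undirected triangle is counted six times, once per ordered labelling. Such multiplicities cancel in the normalized ratios used below.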

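The maximum value of $\Phi_{\mathcal{M}}(C)$ corresponds to the partition into a single community
containing all the nodes:

$$\Phi_{\mathcal{M}} = \sum_{i_1=1}^{N}\sum_{i_2=1}^{N}\cdots\sum_{i_M=1}^{N}\;\prod_{(a,b)\in E_{\mathcal{M}}} w_{i_a i_b}. \qquad (2.3)$$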

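For a random network preserving the nodes' strengths, these quantities are
respectively

$$\Omega_{\mathcal{M}}(C) = \sum_{i_1=1}^{N}\sum_{i_2=1}^{N}\cdots\sum_{i_M=1}^{N}\;\prod_{(a,b)\in E_{\mathcal{M}}} w_{i_a}^{\mathrm{out}}\, w_{i_b}^{\mathrm{in}}\,\delta(C_{i_a}, C_{i_b}) \qquad (2.4)$$

and

$$\Omega_{\mathcal{M}} = \sum_{i_1=1}^{N}\sum_{i_2=1}^{N}\cdots\sum_{i_M=1}^{N}\;\prod_{(a,b)\in E_{\mathcal{M}}} w_{i_a}^{\mathrm{out}}\, w_{i_b}^{\mathrm{in}}. \qquad (2.5)$$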

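Now, by analogy with the standard modularity, we define the motif modularity as
the fraction of motifs inside the communities minus the fraction in a random network
which preserves the nodes' strengths:

$$Q_{\mathcal{M}}(C) = \frac{\Phi_{\mathcal{M}}(C)}{\Phi_{\mathcal{M}}} - \frac{\Omega_{\mathcal{M}}(C)}{\Omega_{\mathcal{M}}}. \qquad (2.6)$$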

The introduction of nullcase weights $n_{ij}$, masked weights $w_{ij}(C)$ and masked nullcase
weights $n_{ij}(C)$,

$$n_{ij} = w_i^{\mathrm{out}}\, w_j^{\mathrm{in}}, \qquad (2.7)$$
$$w_{ij}(C) = w_{ij}\,\delta(C_i, C_j), \qquad (2.8)$$
$$n_{ij}(C) = n_{ij}\,\delta(C_i, C_j), \qquad (2.9)$$

allows the simplification of the previous expressions, in particular of the motif modularity:

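$$Q_{\mathcal{M}}(C) = \frac{\displaystyle\sum_{i_1 i_2 \cdots i_M}\;\prod_{(a,b)\in E_{\mathcal{M}}} w_{i_a i_b}(C)}{\displaystyle\sum_{i_1 i_2 \cdots i_M}\;\prod_{(a,b)\in E_{\mathcal{M}}} w_{i_a i_b}} - \frac{\displaystyle\sum_{i_1 i_2 \cdots i_M}\;\prod_{(a,b)\in E_{\mathcal{M}}} n_{i_a i_b}(C)}{\displaystyle\sum_{i_1 i_2 \cdots i_M}\;\prod_{(a,b)\in E_{\mathcal{M}}} n_{i_a i_b}}. \qquad (2.10)$$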
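Every factor in (2.10) is a matrix entry indexed by a motif edge, so each of the four sums is a tensor contraction. NumPy's `einsum` can evaluate such contractions directly. The sketch below is one possible implementation based on that observation, not the authors' code. It builds the `einsum` subscripts from the motif's edge list, using 0-based positions and at most 26 motif nodes. It then uses $n_{ij} = w_i^{\mathrm{out}} w_j^{\mathrm{in}}$ from (2.7). The function names `motif_sum` and `motif_modularity` are hypothetical.

```python
import numpy as np

LETTERS = "abcdefghijklmnopqrstuvwxyz"  # one einsum index per motif node

def motif_sum(X, motif_edges):
    """sum_{i_1..i_M} prod_{(a,b) in E_M} X[i_a, i_b] as one einsum contraction.

    motif_edges: list of (a, b) pairs of 0-based motif positions (< 26).
    Degenerate tuples with repeated nodes are included automatically.
    """
    subscripts = ",".join(LETTERS[a] + LETTERS[b] for a, b in motif_edges) + "->"
    return float(np.einsum(subscripts, *([X] * len(motif_edges)), optimize=True))

def motif_modularity(W, motif_edges, labels):
    """Motif modularity Q_M(C) of eq. (2.10)."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    delta = (labels[:, None] == labels[None, :]).astype(float)  # delta(C_i, C_j)
    n = np.outer(W.sum(axis=1), W.sum(axis=0))                  # n_ij, eq. (2.7)
    real = motif_sum(W * delta, motif_edges) / motif_sum(W, motif_edges)
    null = motif_sum(n * delta, motif_edges) / motif_sum(n, motif_edges)
    return real - null

# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
labels = [0, 0, 0, 1, 1, 1]
edge = [(0, 1)]                       # single-edge motif
triangle = [(0, 1), (1, 2), (2, 0)]   # triangular motif
print(motif_modularity(W, edge, labels))      # standard modularity, about 5/14
print(motif_modularity(W, triangle, labels))  # triangle modularity, 0.75
```

With the single-edge motif the function reproduces standard modularity. With the triangle motif the same split scores 0.75, since both triangles lie entirely inside their communities.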

Motif modularity may be further generalized by relaxing the condition that all nodes
of the motif should be fully inside the modules. This is done just by removing some of
the maskings in (2.10) as required, and possibly by adding some Kronecker $\delta$
functions between non-adjacent nodes of the motif. In this way, it is possible to define
classes of nodes different from communities, as we shall see in subsection 2.3.



2.2. Cycle modularity 

Among the simplest possible motifs, triangles are the ones that have received the most
attention in the networks literature. For instance, it has been shown that real networks
have higher clustering coefficients than expected in random networks [32]. Thus, it
would be desirable to be able to find "communities of triangles". Our approach
consists of the definition of a triangle modularity $Q_{\triangle}(C)$, based on the triangular motif
$E_{\triangle} = \{(1,2), (2,3), (3,1)\}$, which reads:

$$Q_{\triangle}(C) = \frac{\sum_{ijk} w_{ij}(C)\,w_{jk}(C)\,w_{ki}(C)}{\sum_{ijk} w_{ij}\,w_{jk}\,w_{ki}} - \frac{\sum_{ijk} n_{ij}(C)\,n_{jk}(C)\,n_{ki}(C)}{\sum_{ijk} n_{ij}\,n_{jk}\,n_{ki}}. \qquad (2.11)$$

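Triangle modularity generalizes directly to cycles of length $\ell$, making use of
the cyclic motif $E_{\mathcal{C}(\ell)} = \{(1,2), (2,3), \ldots, (\ell-1,\ell), (\ell,1)\}$. The number of these motifs
within the communities is given by

$$\Phi_{\mathcal{C}(\ell)}(C) = \sum_{i_1 i_2 \cdots i_\ell} w_{i_1 i_2}(C)\,w_{i_2 i_3}(C)\cdots w_{i_{\ell-1} i_\ell}(C)\,w_{i_\ell i_1}(C), \qquad (2.12)$$

which is the trace of the $\ell$th power of the masked weight matrix. The full formula for the
cycle modularity $Q_{\mathcal{C}(\ell)}(C)$ follows immediately from it.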
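Because (2.12) is a trace of a matrix power, cycle modularity can be computed without enumerating node tuples. The sketch below assumes a dense NumPy weight matrix. `cycle_modularity` is a hypothetical name. The ratio is undefined (division by zero) when the network has no closed paths of the chosen length.

```python
import numpy as np

def cycle_modularity(W, labels, length):
    """Cycle modularity Q_C(l)(C): eq. (2.12) and its null counterpart as traces."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    delta = (labels[:, None] == labels[None, :]).astype(float)  # delta(C_i, C_j)
    n = np.outer(W.sum(axis=1), W.sum(axis=0))                  # n_ij = w_i^out w_j^in

    def closed_paths(X):
        # trace(X^l) = sum over i_1..i_l of X[i_1, i_2] ... X[i_l, i_1]
        return np.trace(np.linalg.matrix_power(X, length))

    return (closed_paths(W * delta) / closed_paths(W)
            - closed_paths(n * delta) / closed_paths(n))

# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
print(cycle_modularity(W, [0, 0, 0, 1, 1, 1], 3))   # triangle modularity (2.11): 0.75
```

With `length=3` this is the triangle modularity (2.11). The trace counts each undirected triangle six times in both numerator and denominator, so the factor cancels.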

If the network is directed, other non-cyclical motifs exist. We skip them, since their 
derivation is straightforward. 



2.3. Path modularity 

A path $\mathcal{P}(\ell)$ of length $\ell$ is simply the linear motif $E_{\mathcal{P}(\ell)} = \{(1,2), (2,3), \ldots, (\ell,\ell+1)\}$.
We remark that cycles are closed paths, but here we shall only consider open paths.
The number of paths of length $\ell$ fully inside the communities is given by

$$\Phi_{\mathcal{P}(\ell)}(C) = \sum_{i_1 i_2 \cdots i_{\ell+1}} w_{i_1 i_2}(C)\,w_{i_2 i_3}(C)\cdots w_{i_\ell i_{\ell+1}}(C). \qquad (2.13)$$

Note that this expression equals the sum of the components of the $\ell$th power of the
masked weight matrix.

The path of length $\ell = 1$ corresponds to the simplest motif $E_{\mathcal{P}(1)} = \{(1,2)\}$, which
is just a single edge, so its motif modularity (2.10) equals the standard definition of
modularity (2.1).
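Following the remark that (2.13) is the sum of the components of a matrix power, path modularity can be sketched in a few lines. The check with $\ell = 1$ reproduces standard modularity, as stated above. The name `path_modularity` is hypothetical, and the code assumes a dense NumPy weight matrix.

```python
import numpy as np

def path_modularity(W, labels, length):
    """Path modularity from eq. (2.13): every count is the sum of all entries
    of the length-th power of the (masked) weight or nullcase matrix."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    delta = (labels[:, None] == labels[None, :]).astype(float)  # delta(C_i, C_j)
    n = np.outer(W.sum(axis=1), W.sum(axis=0))                  # n_ij = w_i^out w_j^in

    def paths(X):
        return np.linalg.matrix_power(X, length).sum()

    return paths(W * delta) / paths(W) - paths(n * delta) / paths(n)

# Usage: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
labels = [0, 0, 0, 1, 1, 1]
print(path_modularity(W, labels, 1))   # equals standard modularity (2.1), about 5/14
print(path_modularity(W, labels, 2))   # paths of length 2 fully inside communities
```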

Paths of length 2 are also useful for the analysis of bipartite networks, provided
one removes the constraint that all nodes of the path belong to the same module. If
one allows the middle node of a path of length 2 to be any node of the network,
whereas the first and third nodes are kept within the same group, the path can be used
to discover relationships between nodes of different groups. If a network is bipartite, for
instance, there will be many paths of length 2 starting from one class and returning to it
through the other class. If only the extremes of a path of length $\ell$ are required to be
inside the community, the total number of such paths is given by

$$\sum_{i_1 i_2 \cdots i_{\ell+1}} w_{i_1 i_2}\,w_{i_2 i_3}\cdots w_{i_\ell i_{\ell+1}}\,\delta(C_{i_1}, C_{i_{\ell+1}}). \qquad (2.14)$$

In this case, the calculation makes use of the $\ell$th power of the weight matrix (instead of
the masked weight matrix), and the masking $\delta(C_{i_1}, C_{i_{\ell+1}})$ is applied to its components
before summing them.
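The free-intermediate variant differs from the previous sketch only in where the mask is applied. The matrix power is taken on the unmasked matrix, and $\delta(C_{i_1}, C_{i_{\ell+1}})$ multiplies its entries before summing, as in (2.14). The nullcase term is treated the same way. The sketch below is hypothetical (`free_path_modularity` is an illustrative name). It evaluates two candidate partitions of the eight-leaf star discussed in section 3. It only compares partitions and is not an optimizer.

```python
import numpy as np

def free_path_modularity(W, labels, length):
    """Path modularity where only the end points i_1 and i_{l+1} must share
    a community, as in eq. (2.14); intermediate nodes are free."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    delta = (labels[:, None] == labels[None, :]).astype(float)  # delta(C_i, C_j)
    n = np.outer(W.sum(axis=1), W.sum(axis=0))                  # n_ij = w_i^out w_j^in

    def ratio(X):
        P = np.linalg.matrix_power(X, length)   # unmasked l-th power
        return (P * delta).sum() / P.sum()      # mask applied to its entries

    return ratio(W) - ratio(n)

# Usage: star with hub 0 and eight leaves 1..8.
W = np.zeros((9, 9))
W[0, 1:] = W[1:, 0] = 1.0
hub_vs_leaves = [0] + [1] * 8
one_community = [0] * 9
print(free_path_modularity(W, hub_vs_leaves, 2))   # 0.5
print(free_path_modularity(W, one_community, 2))   # 0.0
print(free_path_modularity(W, hub_vs_leaves, 1))   # -0.5 (standard modularity)
```

For $\ell = 2$ the hub/leaves split scores 0.5, against 0 for the single community. For $\ell = 1$, where the measure reduces to standard modularity, the same split is negative. This is consistent with the text's remark that standard modularity recovers it only by minimization.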

3. Examples and tests 

When one is faced with the problem of community detection in a particular network,
the first thing to do should be to answer the following question: what sort of
connectivity patterns or motifs are pertinent in this study? Depending on the answer,
it is straightforward to select one of the possible motif modularities. In this section we
present examples of the application of the previous framework to two synthetic networks.
Finally, we perform two tests on real networks whose actual partitions are known.

The synthetic networks that we have generated for this purpose are the clique &
circle network and the star network. In figure 1 we show these networks as well as the
classes found using different motif modularities. Suppose we want to find node classes
by means of triangles. When we optimize the triangle modularity for the clique & circle
network, the clique forms a community whereas the nodes of the circle are separated
into five singleton communities. This is due to the absence of triangles within the circle.
In contrast, the standard modularity identifies the circle as a community.
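The reasoning above can be illustrated by scoring two candidate partitions, rather than by running a full optimization. The exact network of figure 1 is not recoverable from the text. The sketch therefore assumes a hypothetical five-node clique joined by a single edge to the five-node circle; only the circle's five nodes come from the text. `modularities` is an illustrative helper name.

```python
import numpy as np

def modularities(W, labels):
    """Return (triangle modularity, eq. (2.11); standard modularity, eq. (2.1))."""
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    delta = (labels[:, None] == labels[None, :]).astype(float)  # delta(C_i, C_j)
    n = np.outer(W.sum(axis=1), W.sum(axis=0))                  # n_ij, eq. (2.7)

    def triangles(X):
        return np.trace(np.linalg.matrix_power(X, 3))

    def edges(X):
        return X.sum()

    def q(count):
        return count(W * delta) / count(W) - count(n * delta) / count(n)

    return q(triangles), q(edges)

# Hypothetical clique & circle network: a 5-clique on nodes 0-4, a 5-node
# circle on nodes 5-9, and an assumed single bridge edge 4-5.
W = np.zeros((10, 10))
W[:5, :5] = 1.0 - np.eye(5)
for k in range(5):
    i, j = 5 + k, 5 + (k + 1) % 5
    W[i, j] = W[j, i] = 1.0
W[4, 5] = W[5, 4] = 1.0

circle_as_singletons = [0] * 5 + [1, 2, 3, 4, 5]
circle_as_community = [0] * 5 + [1] * 5
q_tri_s, q_std_s = modularities(W, circle_as_singletons)
q_tri_c, q_std_c = modularities(W, circle_as_community)
print(q_tri_s > q_tri_c)   # True: triangle modularity prefers circle singletons
print(q_std_c > q_std_s)   # True: standard modularity prefers the circle as one group
```

Both partitions contain all the clique's triangles, so the observed term is identical. Grouping the triangle-free circle only enlarges the null-model term, which lowers $Q_\triangle$. Edge-based modularity instead rewards the five internal circle edges.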

The second example, the star network, is a case where path motifs prove to be
useful. This network can be seen as a simple bipartite network with eight actors (the
leaf nodes) and just one event (the hub node). In this case, recalling what we said
in the previous section, the path modularity of length 2 with a free intermediate node is
the appropriate motif modularity to use. The results confirm that the star is decomposed into
two classes, one for the leaves and another for the hub. The same partition is obtained
for any even path length with free intermediate nodes, while for odd path lengths all
nodes are joined in a single community. The latter also happens if one maximizes the standard
modularity; however, the correct partition of the network can be recovered by modularity
minimization.



Motif-based communities in complex networks 7 




Figure 1. Results for two synthetic networks: (a) clique & circle network, with
triangle modularity; (b) star network, with the modularity of paths of length 2 with a free
intermediate node (see text for details). Members of the same class are depicted using
the same symbol and color.

The real networks used for testing are the Zachary Karate Club network [16] and the
Southern Women Event Participation network [17, 18]. A description of each network
can be found in the respective references. For the mathematical analysis presented
here, the relevant fact about these networks is that we know the actual split that
occurred in the Zachary network, as well as the most plausible classification assigned in
the literature to the Women Event Participation data, as reported by Freeman [18]. In
figure 2 we show both networks as well as their respective partitions.

For the Zachary Karate Club network, the nature of the data suggests optimizing
path modularities, since the decision to follow either of the two leaders
during the split of the club surely depended on higher-order friendship relationships
(friends of friends, and so on). When a path modularity of length 1 is considered (i.e. the
classical definition of modularity), the best partition obtained splits each of the two
real communities into two sub-communities, yielding a partition into four communities.
But when one looks for a more compact structure of the communities, which can be
accomplished by increasing the length of the paths, the optimization of path modularity
recovers the actual split observed, for all path lengths we have used (from 2 to 6). The
same result is obtained when the paths are replaced by cycles (lengths from 4 to 9).
Triangles give almost the exact partition, with two exceptions: nodes 10 and 12
become isolated, because they do not belong to any triangle.

The second network tested is a multipartite network. In this case, as well as for
the star network, the use of path modularity of length 2 with a free intermediate node
is crucial, and it accounts for the role differentiation between women and events. The
results not only reveal the two roles of events and women, but also recover the internal
split of the women according to their participation in events, a classification made by
social scientists [18] (with the same exception of one woman as in the weighted projection
and bipartite methods in [12]). In this case, the minimization of standard modularity is
only able to separate women and events, with no further subdivision.

Figure 2. Results for two real networks: (a) Zachary Karate Club network. We
depict the real splitting obtained when using several path and cycle modularities; (b)
Southern Women Event Participation network. We depict the results of the analysis
of this multipartite network without any projection, simply applying the modularity of
paths of length 2 with a free intermediate node. Remarkably, the results clearly show the
role differentiation of women and events, as well as the splitting of women according to
their participation in events that has been reported in the literature.

4. Conclusions 

In this work we have shown that a general classification of node groups in networks is 
possible if one uses motifs as elementary units, instead of simple edges. To show that, 
we generalized Newman-Girvan modularity by replacing edges with motifs. The new 
versions of modularity obtained have been tested on synthetic and real networks, and are 
able to recover expected connectivity patterns in networks, both when the networks have 
modular structure and when they have multipartite structure. However, the principle 
goes beyond the use of modularity and could inspire promising alternative frameworks. 